A Language Independent Approach To Acquiring Phonotactic Resources for Speech Recognition
Abstract
Building and developing linguistic resources is of prime importance for many areas of application. This paper focuses on a fully automatic approach to acquiring syllable phonotactics for a particular language. In this approach, the phonotactic constraints of a language are encoded in a finite-state phonotactic automaton whose structure can be derived automatically from an example set of well-formed syllables of the language in question. This automatic acquisition is achieved with a regular grammatical inference algorithm that is entirely data-driven, so it can be applied to any language for which syllable-labelled data is available. The approach allows rapid, low-cost development of phonotactic resources for any language under study. This makes it attractive for lesser-studied languages, where syllable-labelled data may be available but the language-specific knowledge needed to hand-construct a phonotactics may not be. Since syllable-labelled data is not always available either, a semi-automatic approach to acquiring syllable phonotactics from phoneme-labelled data without syllable boundaries is also discussed.
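The abstract does not name the specific inference algorithm, so the following is only a minimal sketch of the general idea under stated assumptions. It builds a prefix tree acceptor from a toy set of syllables (each a tuple of phoneme symbols) and then generalises it with the classic k-tails state-merging heuristic, which merges states whose accepted suffixes of length at most k coincide. The function names `build_pta`, `infer_phonotactic_automaton` and `accepts`, the toy phoneme symbols, and the use of k-tails itself are illustrative assumptions, not the paper's method.

```python
from collections import defaultdict


def build_pta(syllables):
    """Build a prefix tree acceptor (PTA) that accepts exactly the training syllables.

    Each syllable is a tuple of phoneme symbols. Returns (delta, finals, n_states),
    where delta maps (state, phoneme) -> state and state 0 is the start state.
    """
    prefix_to_state = {(): 0}
    delta = {}
    finals = set()
    for syllable in syllables:
        prefix = ()
        for phoneme in syllable:
            extended = prefix + (phoneme,)
            if extended not in prefix_to_state:
                prefix_to_state[extended] = len(prefix_to_state)
            delta[(prefix_to_state[prefix], phoneme)] = prefix_to_state[extended]
            prefix = extended
        finals.add(prefix_to_state[prefix])
    return delta, finals, len(prefix_to_state)


def infer_phonotactic_automaton(syllables, k):
    """Generalise the PTA by merging states that share the same k-tail.

    Assumption: k-tails stands in for the paper's (unnamed) inference algorithm.
    Returns (start_block, transitions, final_blocks); transitions maps
    block -> phoneme -> set of blocks, so the result may be nondeterministic.
    """
    delta, finals, n_states = build_pta(syllables)
    outgoing = defaultdict(list)
    for (src, phoneme), dst in delta.items():
        outgoing[src].append((phoneme, dst))

    def tail(state, depth):
        # Phoneme strings of length <= depth leading from `state` to an accepting state.
        result = {()} if state in finals else set()
        if depth > 0:
            for phoneme, dst in outgoing[state]:
                result |= {(phoneme,) + rest for rest in tail(dst, depth - 1)}
        return frozenset(result)

    # PTA states with identical k-tails are merged into one block.
    block_ids = {}
    block_of = {}
    for state in range(n_states):
        block_of[state] = block_ids.setdefault(tail(state, k), len(block_ids))

    transitions = defaultdict(lambda: defaultdict(set))
    for (src, phoneme), dst in delta.items():
        transitions[block_of[src]][phoneme].add(block_of[dst])
    final_blocks = {block_of[state] for state in finals}
    return block_of[0], transitions, final_blocks


def accepts(automaton, syllable):
    """Check a syllable against the merged automaton by tracking all reachable blocks."""
    start, transitions, final_blocks = automaton
    current = {start}
    for phoneme in syllable:
        # .get avoids inserting empty entries into the defaultdicts.
        current = {nxt for block in current
                   for nxt in transitions.get(block, {}).get(phoneme, set())}
        if not current:
            return False
    return bool(current & final_blocks)


# Hypothetical toy training data: three well-formed syllables.
training = [("k", "a", "t"), ("b", "a", "t"), ("k", "a", "p")]
fsa_k1 = infer_phonotactic_automaton(training, k=1)
fsa_k2 = infer_phonotactic_automaton(training, k=2)
print(accepts(fsa_k1, ("b", "a", "p")))  # True: unseen syllable accepted
print(accepts(fsa_k2, ("b", "a", "p")))  # False
print(accepts(fsa_k1, ("t", "a", "k")))  # False
```

Merging states never removes a training syllable from the accepted language, so every example stays well-formed. Smaller values of k merge more states and therefore generalise more aggressively: with k=1 the toy automaton accepts the unseen but plausible "b a p", yet also over-generalises to "k k a t"; with k=2 it stays closer to the training data and rejects "b a p".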